How Many Nodes are Effectively Accessed in Complex Networks?
The measurement called accessibility has been proposed as a means to quantify the efficiency of the
communication between nodes in complex networks. This article reports important results regarding 
the properties of the accessibility, including its relationship with the average minimal time to visit 
all nodes reachable after h steps along a random walk starting from a source, as well as the number 
of nodes that are visited after a finite period of time. We characterize the relationship between 
accessibility and the average number of walks required in order to visit all reachable nodes (the 
exploration time), conjecture that the maximum accessibility implies the minimal exploration time, 
and confirm the relationship between the accessibility values and the number of nodes visited after 
a basic time unit. The latter relationship is investigated with respect to three types of dynamics, 
namely: traditional random walks, self-avoiding random walks, and preferential random walks.

I. INTRODUCTION

A critical issue in the study of complex systems regards the interdependency between
connectivity and dynamics [ll-ll]. For instance, given a specific network topology, it would
be interesting to predict how it behaves with respect to several types of dynamics. It has
been shown, for example, that reaction-diffusion dynamics spreads more quickly in scale-free
complex networks [3] than in uniformly random networks. Also, consensus dynamics tends to
converge faster in small-world topologies @. A possible way to address this problem is to
obtain meaningful measurements of the network topology and then try to correlate them with
relevant properties of the dynamics. This analysis can be performed at the local or the
global level, which provide complementary characterizations of the relationship between
structure and dynamics.

Particularly important types of dynamics include communications, flow, and diffusion [6-9].
Several real-world complex systems are underlain by this type of dynamics, including accesses
to WWW pages [10], disease spreading [11], power distribution collapse [12], and underground
and highway systems [13], [14]. Frequently, the activation of these systems starts at a
specific node, or set of nodes, henceforth called sources, and unfolds into the remainder of
the network in ways that are intrinsically dependent on the network topology. More
specifically, it would be desirable to quantify how effectively a given source can influence
the overall network dynamics. By 'effectively' it is meant the time that is required for the
activation to reach specific levels at a given set of nodes, or the total activation at such a
set after a given period of time. These concepts are closely related to the so-called
coupon-collector problem [16], [17]: given a number of coupons (i.e. nodes), each with a
respective probability of occurrence, how many attempts will be required, on average, until
all coupons are obtained? Alternatively, it is also important to identify how many nodes will
be accessed after a given period of time. The current work addresses these problems through
the concept of accessibility [18], which quantifies, for a given source node, the number of
effectively accessible nodes at a given distance and with respect to a specific dynamics. In
this sense, this measure complements the traditional hierarchical degree [19], providing
valuable information about the network structure. Note that the accessibility takes into
account not only the number of nodes at a given distance, but also the transition
probabilities between the source and these nodes.

The potential of the accessibility to provide valuable insights about the structure and
dynamics of complex networks has been confirmed with respect to many applications (Section
II), including the definition and identification of the borders of complex networks [20].
However, some important aspects of this measurement remained to be formalized in a more
comprehensive fashion. For instance, how is the accessibility related to the minimum average
time required for accessing all reachable nodes? Or, in which sense does the accessibility
quantify the number of effectively accessed nodes? Answering these questions in a satisfying
way constitutes the main objective of the present article, as this paves the way not only to
more complete interpretations of the obtained results but also to different types of
applications. In particular, we show that the accessibility can be interpreted in a
conceptually meaningful way as being related to the number of nodes that can be visited
during a given period of time.

This work starts by reviewing the applications of the accessibility already reported in the
literature. Then, we define and illustrate the accessibility concept, followed by establishing
its relationship with the coupon collector problem and showing that the accessibility is
related to the number of nodes effectively accessed after a period of time.



II. APPLICATIONS 

Several different applications of the accessibility concept have been reported. For instance,
it has been shown [18] that, in geographical networks, nodes located close to the peripheral
regions have lower values of accessibility. By extending this result to non-geographical
networks, it has been possible to define the border of complex networks as the set of nodes
with accessibility smaller than a given threshold value [20]. Moreover, recent investigations
have shown that the position of nodes (inside or outside borders) drastically affects the
activity of nodes [U. Other applications unveiled correlations between the accessibility and
real-world properties of nodes. In particular, in [23] the authors investigated the network
obtained from the theorems in Wikipedia. In such a network, each theorem is a node and two
nodes are connected whenever a hyperlink is found between the theorems. The results indicate
that older theorems have higher accessibility values, while newer theorems exhibit lower
accessibility values. Consequently, new theorems are located at the periphery of the network,
defining the frontier of mathematical knowledge. The accessibility has also been used to
investigate the effects of underground systems on the transportation properties of large
cities. It was shown that overall transportation can be enhanced by incorporating the
underground networks [24]. These results were obtained for the London and Paris
transportation networks.



III. THE EFFECTIVE NUMBER OF ACCESSIBLE NODES

Given a source node $i$, suppose it is possible to reach $N_i(h)$ different nodes by
performing walks of length $h$ departing from $i$. Then, we say that $i$ has $N_i(h)$
reachable neighbors at distance $h$. Each neighbor is reached with its own probability, which
is represented by the vector
$\mathbf{p}_i^{(h)} = \{p_1^{(h)}, p_2^{(h)}, \ldots, p_{N_i(h)}^{(h)}\}$. Given this vector,
the accessibility of the node $i$, at scale $h$, is defined as:

\[ \kappa_i(h) = \exp\left( -\sum_{j=1}^{N_i(h)} p_j^{(h)} \log p_j^{(h)} \right) \tag{1} \]



Accessibility values are in the range $[1, N_i(h)]$, the maximum being obtained for the
homogeneous case, when all probabilities have the same value $1/N_i(h)$. This measurement,
which is related to the heterogeneity of the vector $\mathbf{p}$, provides a generalization
of the classical concept of hierarchical (or concentric) degree [19], as explained in
Figure 1. The hierarchical degree of a source node $i$ at distance $h$ is defined as
$k_i(h) = N_i(h)$, i.e. it is the number of nodes which are at distance $h$ from node $i$. It
is important to note that the value of $k_i(h)$ does not take into account a dynamical
process, nor the respective edge weights in the case of weighted networks. The accessibility
generalizes the concept of hierarchical degree by considering that a specific dynamics is
unfolding in the network. We show in this article that the accessibility can be understood as
a kind of effective hierarchical degree.

FIG. 1: (a) Hierarchical (or concentric) organization around the source node (node $i$).
(b) Homogeneous case, where all neighbors are reached with the same probability.
(c) Heterogeneous case, where one node has a higher probability of being reached.
Accessibilities for the (d) homogeneous and (e) heterogeneous cases.



In Figure 1(a), we show the hierarchical levels around the source node $i$ up to the distance
$h$. The network topology, as well as the type of random walk adopted, defines the transition
probabilities, i.e. the components of the vector $\mathbf{p}$. In Figures 1(b) and 1(c) we
represent these probabilities by using different widths for the edges. Observe that in both
cases the source node is able to reach $N_i(h) = 3$ nodes. In the first case, all nodes have
the same probability, while in the second case one of the nodes has a higher probability than
the others. This means that, in the first case, the source node accesses its neighbors in a
more uniform manner, which yields an accessibility value equal to 3, as shown in Figure 1(d).
On the other hand, the interaction between the source and its neighbors in the second case is
biased toward a given node, which decreases the effective hierarchical degree to almost 1.9,
as shown in Figure 1(e).
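
The two cases above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `accessibility` is illustrative, and the heterogeneous vector (0.8, 0.1, 0.1) is an assumed example chosen because it gives a value close to 1.9; the actual probabilities behind Figure 1 are not stated in the text.

```python
import math


def accessibility(probs):
    """Exponential of the Shannon entropy of a probability vector."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero entries contribute nothing, since p * log(p) -> 0 as p -> 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


homogeneous = [1 / 3, 1 / 3, 1 / 3]
heterogeneous = [0.8, 0.1, 0.1]  # assumed values, not read from Figure 1

print(accessibility(homogeneous))    # equals N_i(h) = 3 up to rounding
print(accessibility(heterogeneous))  # roughly 1.89
```

Both vectors describe three reachable nodes, so the hierarchical degree is 3 in each case; only the accessibility distinguishes them.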

It is important to note that the idea of measuring the heterogeneity among first-neighbor
nodes in weighted networks was previously proposed in [25, 26], with the so-called disparity.
More recently, in [27] the authors presented a generalization of this measure, namely the
Rényi disparity, which is based on the Rényi entropy. In a particular case, the Rényi
disparity uses the Shannon entropy in order to quantify the heterogeneity of the weights
attached to the edges of a node. This particular case has an equation similar to
Equation (1). However, in our case, we consider not only the first neighbors, but all nodes
that can be reached at distance $h$ by a specific dynamics. In this sense, our approach can
also be applied to non-weighted networks, since we consider the transition probabilities
instead of the edge weights.

Another way to think about the interaction between a source node and its neighbors is by
considering the coupon collector problem. This problem [16], [17] deals with the following
question: on average, how many walks of length $h$ departing from $i$ are required in order
to visit, at least once, all nodes reachable from $i$ after $h$ steps? We will call this
quantity the exploration time of the node $i$ and denote it by $\tau_i(h)$: since we can
consider the displacement velocity through the network to be constant, the number of walks is
proportional to the time needed to visit all $N_i(h)$ nodes. This problem can be mapped into
a Poisson problem [28] with independent variables, which yields Equation (2):

\[ \tau_i(h) = \int_0^{\infty} \left[ 1 - \prod_{j=1}^{N_i(h)} \left( 1 - e^{-p_j^{(h)} t} \right) \right] dt \tag{2} \]



A conjecture has been proposed [17], [29] that $\tau_i(h)$ reaches its minimum value for the
homogeneous case, where all neighboring nodes are reached with the same probability, i.e.
$p_j^{(h)} = 1/N_i(h)$ for any $j$. In this case, it is not difficult to show that
Equation (2) can be rewritten as:

\[ \tau_i(h) = N_i(h) \sum_{j=1}^{N_i(h)} \frac{1}{j} \tag{3} \]
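
The exploration time is the expected number of draws in a coupon collector problem with unequal probabilities. A standard closed form for that expectation is the inclusion-exclusion sum $\tau = \sum_{S \neq \emptyset} (-1)^{|S|+1} / \sum_{j \in S} p_j$, taken over all non-empty subsets $S$ of the reachable nodes. The sketch below evaluates this sum directly and compares it with the homogeneous expression $N \sum_j 1/j$; the function names are illustrative, and the enumeration of subsets is only practical for small $N$.

```python
from itertools import combinations


def exploration_time(probs):
    """Expected number of draws until every outcome has appeared at least once.

    Uses inclusion-exclusion over all non-empty subsets, so the cost grows
    as 2**len(probs). All probabilities must be strictly positive.
    """
    if any(p <= 0.0 for p in probs):
        raise ValueError("all probabilities must be positive")
    total = 0.0
    for size in range(1, len(probs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total


def homogeneous_time(n):
    """Exploration time when all n outcomes are equally likely."""
    return n * sum(1.0 / j for j in range(1, n + 1))


print(exploration_time([1 / 3, 1 / 3, 1 / 3]), homogeneous_time(3))
print(exploration_time([0.8, 0.1, 0.1]))  # assumed heterogeneous example
```

For three equally likely nodes both functions give 5.5 walks, whereas the assumed heterogeneous vector needs about 15, in line with the homogeneous case being the fastest to explore.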



Therefore, by using the conjecture cited above, we can say that the accessibility is maximum
whenever the exploration time is minimum. This characteristic is illustrated in Figure 2,
which shows a scatter plot of the accessibility, $\kappa$, against the exploration time,
$\tau$, for $10^5$ randomly generated vectors $\mathbf{p}$ with length $N = 6$. The plot also
shows a set of important curves, which provide a more comprehensive characterization of the
probability configurations. They correspond to the specific cases where exactly $n$
probabilities have a value $\epsilon$, while all the other $(N - n)$ probabilities are also
identical among themselves (so that the sum of all these probabilities is equal to one). The
solid line corresponds to the case $n = 1$, so that $N - 1$ probabilities have the same
value. This line also corresponds to the bounding value of the accessibility as a function of
$\tau$, meaning that all the possible configurations of $\mathbf{p}$ are enclosed by this
curve. The dashed line corresponds to the configurations where $n = 2$. Similarly, the dotted
line corresponds to the situations where $n = 3$; in this case, the probabilities are split
into two halves, each sharing a common value. One can use the parametrization $\epsilon$ in
Equations (1) and (2) in order to obtain general equations (indexed $C$) characterizing these
curves:

\[ \kappa_C = \exp\left[ -n \epsilon \log \epsilon - (N - n)\, p \log p \right] \tag{4} \]

\[ \tau_C = \int_0^{\infty} \left[ 1 - \left( 1 - e^{-\epsilon t} \right)^{n} \left( 1 - e^{-p t} \right)^{N - n} \right] dt \tag{5} \]

where $p = (1 - n\epsilon)/(N - n)$. Observe that $\epsilon$ lies in the interval $[0, 1/n]$.
When $\epsilon < 1/N$, the upper part of the curves is obtained. In this case, we have
$\kappa_C \to N - n$ and $\tau_C \to \infty$ for $\epsilon \to 0$. For $\epsilon > 1/N$, we
have the bottom part of the curves, for which $\kappa_C \to n$ and $\tau_C \to \infty$ when
$\epsilon \to 1/n$. When $\epsilon = 1/N$, we reach the homogeneous case, where the
accessibility is maximum and the exploration time is minimum.
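
The two-group curves can be traced numerically. The sketch below is a hedged illustration, not the authors' code: it assumes that inserting the parametrization into the entropy-based accessibility gives $\kappa_C = \exp[-n\epsilon\log\epsilon - (N-n)\,p\log p]$, and it evaluates the exploration time through the inclusion-exclusion form of the coupon collector expectation, grouped by how many nodes of each group are selected. The names `curve_point` and `rest` (for the common value $p$ of the remaining probabilities) are illustrative.

```python
import math


def curve_point(eps, n, N):
    """Accessibility and exploration time when n probabilities equal eps
    and the remaining N - n probabilities share the leftover mass."""
    if not 0.0 < eps < 1.0 / n:
        raise ValueError("eps must lie strictly between 0 and 1/n")
    rest = (1.0 - n * eps) / (N - n)
    kappa = math.exp(-n * eps * math.log(eps)
                     - (N - n) * rest * math.log(rest))
    tau = 0.0
    # Inclusion-exclusion: choose a nodes from the first group and b from
    # the second; the subset has total probability a * eps + b * rest.
    for a in range(n + 1):
        for b in range(N - n + 1):
            if a == 0 and b == 0:
                continue
            sign = 1.0 if (a + b) % 2 == 1 else -1.0
            tau += sign * math.comb(n, a) * math.comb(N - n, b) / (a * eps + b * rest)
    return kappa, tau


N, n = 6, 1
for eps in (0.01, 0.05, 1.0 / N, 0.5, 0.9):
    kappa, tau = curve_point(eps, n, N)
    print(f"eps={eps:.3f}  kappa={kappa:.3f}  tau={tau:.2f}")
```

At $\epsilon = 1/N$ the sketch returns $\kappa_C = N$ together with the homogeneous exploration time; moving $\epsilon$ toward either end of its interval lowers the accessibility and lengthens the exploration time.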



A. Probabilities in Uniformly Random Networks 

Now we investigate the coupon collector problem in uniformly random networks. More
specifically, we used 5000 realizations of the Erdős-Rényi model with 200 nodes and average
degree 4, and then derived the transition probabilities from these networks. We adopted
random walks originating from each of the nodes in the networks so as to obtain the
respective transition probabilities (the set of $\mathbf{p}$'s) by using the powers of the
transition matrix [30].
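
The procedure above can be sketched with the Python standard library alone. Several choices here are assumptions rather than details given in the text: the network is drawn from the $G(n, p)$ variant of the Erdős-Rényi model with link probability $\langle k \rangle/(N-1)$, a walker on an isolated node stays where it is, and the $h$-step probabilities are obtained by propagating a distribution $h$ times, which equals taking one row of the $h$-th power of the transition matrix. All function names are illustrative.

```python
import math
import random


def erdos_renyi(num_nodes, avg_degree, rng):
    """Adjacency matrix of a G(n, p) graph with expected degree avg_degree."""
    link_prob = avg_degree / (num_nodes - 1)
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < link_prob:
                adj[u][v] = adj[v][u] = 1
    return adj


def transition_matrix(adj):
    """Row-normalize the adjacency matrix (traditional random walk)."""
    size = len(adj)
    matrix = []
    for u, row in enumerate(adj):
        degree = sum(row)
        if degree == 0:
            # Assumption of this sketch: a walker on an isolated node stays put.
            matrix.append([1.0 if v == u else 0.0 for v in range(size)])
        else:
            matrix.append([a / degree for a in row])
    return matrix


def walk_distribution(matrix, source, h):
    """Row `source` of the h-th power of the transition matrix."""
    dist = [0.0] * len(matrix)
    dist[source] = 1.0
    for _ in range(h):
        nxt = [0.0] * len(matrix)
        for u, mass in enumerate(dist):
            if mass > 0.0:
                for v, p in enumerate(matrix[u]):
                    nxt[v] += mass * p
        dist = nxt
    return dist


def accessibility(probs):
    """Exponential of the Shannon entropy of the probability vector."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0.0))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
adjacency = erdos_renyi(200, 4, rng)
P = transition_matrix(adjacency)
dist = walk_distribution(P, source=0, h=3)
reached = sum(1 for p in dist if p > 0.0)
kappa = accessibility(dist)
print(reached, round(kappa, 3))
```

The number of non-zero entries in `dist` plays the role of $N_i(h)$, and the accessibility computed from the same vector can never exceed it.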

Figure 3 presents the distribution of the cases in the $\kappa \times \tau$ space. This
result takes into account all cases where the number of accessible nodes, $N_i(h)$, is equal
to 10 for values of $h$ in the interval $[2, 15]$. The gray levels correspond to the density
of cases. Remarkably, the density is highly skewed towards the lower bound of the $\epsilon$
curve, and virtually no cases are obtained for the upper half of the probabilities region.
This means that it is extremely unlikely to obtain probability configurations having the
majority of nodes with higher probability, as illustrated in Figure 2.

FIG. 2: Scatter plot of the accessibility against the exploration time for $10^5$ random
vectors with length $N = 6$. The lines correspond to the cases where the probabilities are
divided into two groups having the same values among themselves, as described by
Equations (4) and (5) with parametrization $\epsilon$.

However, it is possible to obtain configurations which occupy the upper boundary region in
the $\kappa \times \tau$ space, where the minority of the probabilities have smaller values.
Figure 4(a) presents a particular situation exemplifying this case, considering an artificial
network with $N$ nodes composed of two groups: (i) a highly connected ER component with
$(N - n)$ nodes and average degree $\langle k \rangle_c$; and (ii) $n$ loosely connected
nodes with $n_c$ ($n_c \ll \langle k \rangle_c$) links to the previous subgraph. This
topological division implies that the nodes in the ER component will be accessed much more
often than the others when considering random walks in this network, irrespective of the
starting node and the length $h$. Thus, in the case of $n \ll N$, the probability vectors
$\mathbf{p}$ will have the majority of their components with higher values, thus occupying
the upper region of the $\kappa \times \tau$ space. This property is verified by the
simulations presented in Figure 4(b), obtained through random walks departing from each node
for values of $h$ (varied from 2 to 15) where all nodes are reachable. We considered a single
realization of the network with $N = 100$ and an ER component with $\langle k \rangle_c$
equal to 50. It was assumed $n = 2$ (empty symbols) and $n = 20$ (filled symbols) with $n_c$
links, varying from 1 to 30, as indicated in the figure. Observe that, as the value of $n_c$
decreases, the $n$ nodes become less accessible and the points move away from the origin (the
homogeneous case), as expected. Although we assumed that the single nodes are directly
connected to the ER component, this example can be immediately extended by considering the
presence of tails of nodes with different sizes. While this network can be artificially
created, obtaining similar results for the occupation of the $\kappa \times \tau$ space, it
has been shown [31] that tails are unlikely to occur in a great variety of real networks,
even for tails with short size. Results for real networks will be shown in the next section.

Figure 5 complements the characterization of the $\kappa \times \tau$ space. It shows: (a)
the local average number of steps necessary to reach 10 nodes after departing from the source
node; (b) the degree of the source node; and (c) its eigenvector centrality obtained for the
probability configurations. It is clear from Figure 5(a) that random walks with a larger
number of steps (i.e. $h$) tend to have smaller accessibility and longer exploration time. On
the other hand, random walks starting from nodes with larger degree (Figure 5(b)) tend to
have larger accessibility and shorter exploration times, though in a less definite fashion
than that observed in Figure 5(a). Furthermore, Figure 5(c) shows a remarkable centrality
pattern: it tends to increase with $\kappa$ while decreasing with $\tau$, apparently
following the level set curves in Figure 2. It should be observed that these results are
specific to the uniformly random ER networks, in the sense that different trends may be
obtained for other theoretical network models.



[Figure 4. Panel (a): highly connected component with (N - n) nodes. Panel (b): horizontal axis $\tau$, with ticks at $10^3$ and $10^4$.]

FIG. 4: (a) Example of a possible configuration where the upper region of the
$\kappa \times \tau$ space is occupied. (b) Results obtained for random walks in the
considered network for $n = 2$ (empty symbols) and $n = 20$ (filled symbols) and different
numbers of connections $n_c$.



B. Real-World Probabilities

We also considered probabilities obtained from real-world networks, namely circuits [32],
power grids [33], German highways [34], protein interactions [35], e-mails [36], and
co-authorships in network science [37]. The probability configurations obtained from these
networks are shown in Figure 6. Again, most of the cases tend to appear near the lower
boundary in the $\kappa \times \tau$ space, which is characterized by low exploration time
and varying accessibility. This is particularly surprising, as it suggests a universal
asymmetry in real networks in which a few probabilities are larger than the others in most
configurations. Therefore, it is interesting to observe that real networks, as with the
uniformly random topologies, tend to minimize the exploration time at the expense of varying
accessibilities.

Now, we proceed to a related problem in which we are interested in knowing how many nodes, on
average, are visited during the time interval $t$ while performing a specific type of random
walk. This quantity will be denoted by $\eta_i(t, h)$, and it provides information about how
the network topology around the source node affects the interaction with its neighbors. After
a long time, we expect that the source node will be able to visit all $N_i(h)$ neighbors,
i.e. $\lim_{t \to \infty} \eta_i(t, h) = N_i(h)$, independent of the vector $\mathbf{p}$.
Therefore, we can consider that the value of $\eta_i(t, h)$ provides an estimate of the
average number of visited nodes during a finite time. This is confirmed in Figure 7 for the
US airlines network [39] and two random counterparts: the Erdős-Rényi model and the
configuration model [38]. In order to obtain the transition probabilities, we considered
three different types of random walks: (i) traditional random walk (TRW), (ii) preferential
random walk (PRW) and (iii) self-avoiding random walk (SARW). They were estimated for
$h = \ell$, where $\ell = 3$ is the network diameter. The TRW and PRW dynamics were
calculated by using powers of the transition matrix, while the SARW was estimated through
agent-based simulation. This calculation was repeated $10^6$ times for each source node. It
should be noted that, in the case of SARW dynamics, if an agent cannot proceed further, it
remains at the final node, contributing to the probabilities for the next steps [HI]. Then,
the transition probabilities were used in Equation (1) to evaluate the accessibility of the
node $i$. The values of $\eta_i(t, h)$ were obtained for each node $i$ as follows: first, we
draw $t$ neighbors of node $i$ at distance $h = \ell$ according to the obtained probabilities
$p_j^{(\ell)}$. Then we count how many different nodes were drawn. The average over several
realizations gives us an estimate of $\eta_i(t, h)$. The behavior of $\kappa$ versus $\eta$
is shown in Figures 7(a), 7(b) and 7(c) for TRW, SARW and PRW, respectively. The results have
been found to be well fitted by the functional form represented in Equation (6):

\[ \eta = a + b \log(\kappa + c) \tag{6} \]
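
The sampling procedure used to estimate $\eta_i(t, h)$ (draw $t$ nodes according to the transition probabilities, count the distinct ones, average over realizations) can be sketched as follows. Because the draws are independent, linearity of expectation also gives the exact mean $\sum_j [1 - (1 - p_j)^t]$, which the sketch uses as a cross-check. The function names, the example vectors and the number of runs are illustrative choices, not values from the text.

```python
import random


def expected_distinct(probs, t):
    """Exact mean number of distinct outcomes in t independent draws."""
    return sum(1.0 - (1.0 - p) ** t for p in probs)


def simulated_distinct(probs, t, runs, rng):
    """Monte Carlo estimate: draw t outcomes, count distinct ones, average."""
    outcomes = range(len(probs))
    total = 0
    for _ in range(runs):
        total += len(set(rng.choices(outcomes, weights=probs, k=t)))
    return total / runs


rng = random.Random(7)  # fixed seed so the sketch is reproducible
for probs in ([1 / 3, 1 / 3, 1 / 3], [0.8, 0.1, 0.1]):
    exact = expected_distinct(probs, t=3)
    estimate = simulated_distinct(probs, t=3, runs=20000, rng=rng)
    print(probs, round(exact, 3), round(estimate, 3))
```

With three draws the homogeneous vector visits about 2.1 distinct nodes on average, while the assumed heterogeneous vector visits about 1.5, so the ordering follows that of the corresponding accessibilities.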



It is interesting to observe in Figure 7 that, when the preferential rules were adopted, the
fraction of reached nodes was strongly decreased in the US airlines network and its
configuration model. This is a direct consequence of the degree heterogeneity of these
networks. For the preferential random walk, the walks are biased to pass through nodes with
high connectivity, so that several possible paths are not used. Therefore, the effective
number of reached nodes is smaller than in the cases where we considered the self-avoiding or
the traditional random walk. As we can see, this phenomenon is not observed in ER networks,
which have a more homogeneous degree distribution.

FIG. 5: Local average measurements for the probability configurations obtained for 5000
Erdős-Rényi networks: (a) number of steps necessary to reach 10 nodes after departing from
the source node, (b) the degree of the source node, and (c) the eigenvector centrality of the
source node.

FIG. 6: Scatter plot of the accessibility against the exploration time for 6 real networks,
considering the cases where $N = 10$ nodes are reached after $h = \{2, 3, \ldots, 20\}$.
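
The reduction in effectively reached nodes under preferential rules can be illustrated on a toy graph. The sketch assumes that a preferential random walk moves to a neighbor with probability proportional to that neighbor's degree; this is a common definition, but the text does not spell out the rule, so it should be read as an assumption. The seven-node graph and all function names are hypothetical.

```python
import math

# Hypothetical toy graph: node 0 is the source, node 1 is a hub.
graph = {
    0: [1, 2, 3],
    1: [0, 4, 5, 6],
    2: [0],
    3: [0],
    4: [1],
    5: [1],
    6: [1],
}


def transition_rows(graph, preferential):
    """Transition probabilities out of each node.

    Traditional walk: uniform over neighbors.
    Preferential walk (assumed rule): proportional to the neighbor's degree.
    """
    rows = {}
    for u, neighbors in graph.items():
        weights = [len(graph[v]) if preferential else 1 for v in neighbors]
        total = sum(weights)
        rows[u] = {v: w / total for v, w in zip(neighbors, weights)}
    return rows


def distribution_after(rows, source, h):
    """Probability of being at each node after h steps from the source."""
    dist = {source: 1.0}
    for _ in range(h):
        nxt = {}
        for u, mass in dist.items():
            for v, p in rows[u].items():
                nxt[v] = nxt.get(v, 0.0) + mass * p
        dist = nxt
    return dist


def accessibility(probs):
    """Exponential of the Shannon entropy of the probability vector."""
    return math.exp(-sum(p * math.log(p) for p in probs if p > 0.0))


trw = distribution_after(transition_rows(graph, preferential=False), 0, 1)
prw = distribution_after(transition_rows(graph, preferential=True), 0, 1)
print(accessibility(trw.values()), accessibility(prw.values()))
```

After one step the traditional walk spreads evenly over the three neighbors of the source, whereas the preferential walk sends two thirds of the probability to the hub, which lowers the accessibility from 3 to about 2.4.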



IV. CONCLUSIONS 

The accessibility concept was introduced recently [18] as a means to quantify the potential
of a node to interact with other nodes in a complex network. Given the many promising results
obtained so far, it became important to better understand the accessibility concept,
especially regarding optimization aspects. The present work focused on the investigation of
the accessibility regarding the coupon collector problem as well as its relationship with the
average number of nodes visited along a random walk during a given time interval.

A number of remarkable results were obtained about the relationship between the accessibility
and the exploration time. First, the minimal exploration time is obtained for the maximum
accessibility. No other relationship between these two properties has been observed when we
consider all possible probability configurations. However, in the cases of uniformly random
and real-world networks, a stronger correlation is verified, with the cases tending to lie
near the lower boundary in the $\kappa \times \tau$ space. As a matter of fact, there is a
very low probability of having cases occupying the upper half of this space. Although this
could suggest some intrinsic impossibility of having such cases, we showed at least one type
of topology leading to a configuration lying over the upper boundary. This remarkable result
shows that in all considered networks the transition probability configurations tend to be
characterized by small exploration time at the expense of varying accessibilities.

Regarding the relationship between the accessibility and the average number of nodes visited
along a random walk during a given time interval, we showed that the concept of accessibility
can be understood as a generalization of the classical degree, in the sense that the
accessibility quantifies the effective number of nodes that can be reached from the source
node after a given number of steps. In order to confirm this statement, we also showed that
there is a strong relationship between the accessibility and the inverse coupon collector
problem, which deals with the number of visited nodes in a finite time interval.

Future works could take into account activations originating from multiple nodes, as well as
how other dynamical properties can be predicted from the accessibility values. It would be
particularly interesting to identify more general theoretical models and real networks
capable of covering the $\kappa \times \tau$ space more uniformly.

FIG. 7: Fraction of reached nodes $\eta_i(t, h)$ as a function of the node accessibility
$\kappa_i$ for (a) classical random walk, (b) self-avoiding random walk and (c) preferential
random walk. The results are illustrated for the US airlines network and two random
counterparts: the Erdős-Rényi model and the respective configuration model [38]. All of them
have the same number of nodes ($N = 332$) and the same average degree
($\langle k \rangle = 12.81$).